HITextracter System for Chemical and Gene/Protein Entity Mention Recognition in Patents

Authors

  • Zengjian Liu
  • Xiaolong Wang
  • Buzhou Tang
  • Qingcai Chen
  • Xue Shi
  • Jiankang Hou

Abstract

In this paper, a hybrid system is proposed for the chemical entity mention recognition (CEMP) and gene/protein related object recognition (GPRO) tasks of the BeCalm challenge. First, five individual machine learning-based subsystems were developed to identify chemical and gene/protein related entity mentions: a bidirectional LSTM (long short-term memory, a variant of the recurrent neural network)-based subsystem without any manually crafted features, a bidirectional LSTM-based subsystem with some manually crafted features, a bidirectional LSTM-based subsystem with orthographic feature learning, a CRF (conditional random field)-based subsystem, and an SSVM (structured support vector machine)-based subsystem. Then, an ensemble learning-based classifier was deployed to combine the results predicted by these individual subsystems. Evaluation on the official test set showed that the best F1-scores achieved by our system were 90.37% on CEMP and 76.34% on GPRO type 1, respectively.
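
The abstract does not say which ensemble classifier was used or how its inputs were encoded, so the sketch below is only an illustration of the general idea of merging several taggers' outputs and scoring the result. It swaps in a simple weighted per-token vote over BIO tags (weights assumed to come from, e.g., each subsystem's development-set score), repairs invalid BIO transitions, and then computes exact-match mention-level F1. The subsystem names ("blstm", "crf", "ssvm"), the label names ("CHEM", "GENE") and all function names are hypothetical.

```python
from collections import defaultdict


def repair_bio(tags):
    """Turn an I-X that does not continue an X mention into B-X."""
    fixed, prev = [], "O"
    for tag in tags:
        if tag.startswith("I-") and prev[2:] != tag[2:]:
            tag = "B-" + tag[2:]
        fixed.append(tag)
        prev = tag
    return fixed


def weighted_vote(predictions, weights):
    """Merge per-token BIO tag sequences from several (hypothetical) subsystems.

    predictions: dict mapping subsystem name -> list of BIO tags
    weights: dict mapping subsystem name -> non-negative weight (assumed,
             e.g. the subsystem's dev-set F1); this is a stand-in for the
             paper's unspecified ensemble classifier.
    """
    lengths = {len(tags) for tags in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all subsystems must tag the same token sequence")
    combined = []
    for i in range(lengths.pop()):
        scores = defaultdict(float)
        for name, tags in predictions.items():
            scores[tags[i]] += weights[name]
        # Highest total weight wins; ties go to the alphabetically first tag.
        combined.append(max(sorted(scores), key=lambda t: scores[t]))
    return repair_bio(combined)


def spans(tags):
    """Extract (start, end_exclusive, type) mentions from a BIO sequence."""
    out, start, kind = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a final mention
        if tag.startswith("I-") and kind == tag[2:]:
            continue  # current mention continues
        if start is not None:
            out.append((start, i, kind))
            start, kind = None, None
        if tag.startswith(("B-", "I-")):
            start, kind = i, tag[2:]
    return out


def mention_f1(gold, pred):
    """Exact-match mention-level F1 between gold and predicted BIO tags."""
    g, p = set(spans(gold)), set(spans(pred))
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["O", "B-CHEM", "O", "B-GENE", "I-GENE"]
    preds = {
        "blstm": ["O", "B-CHEM", "O", "B-GENE", "I-GENE"],
        "crf": ["O", "B-CHEM", "O", "B-GENE", "O"],
        "ssvm": ["O", "O", "O", "O", "I-GENE"],
    }
    weights = {"blstm": 0.9, "crf": 0.8, "ssvm": 0.7}
    merged = weighted_vote(preds, weights)
    print(merged, mention_f1(gold, merged))
    print("crf alone:", mention_f1(gold, preds["crf"]))
```

In this toy example the vote recovers the full two-token gene mention that the "crf" subsystem truncates, so the merged output scores an F1 of 1.0 while "crf" alone scores 0.5. A learned ensemble classifier, as described in the abstract, would replace the fixed weights with parameters fitted on held-out data.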




Publication date: 2017